Smart Paper Notes

A Sketch Is Worth a Thousand Words: Image Retrieval with Text and Sketch

Patsorn Sangkloy, Wittawat Jitkrittum, Diyi Yang, James Hays

Categories: Computer Vision | Machine Learning

2022-08-05

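We address the problem of retrieving images with both a sketch and a text query. We present TASK-former (Text And SKetch transformer), an end-to-end trainable model for image retrieval that takes a text description and a sketch as input. We argue that the two input modalities complement each other in a way that neither can easily achieve alone. TASK-former follows the late-fusion dual-encoder approach, similar to CLIP, which allows efficient and scalable retrieval because the retrieval set can be indexed independently of the queries. We show empirically that using an input sketch (even a poorly drawn one) in addition to text considerably increases retrieval recall compared with traditional text-based image retrieval. To evaluate our approach, we collect 5,000 hand-drawn sketches for images in the test set of the COCO dataset. The collected sketches are available at https://janesjanes.github.io/tsbir/.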
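
The late-fusion dual-encoder idea can be illustrated with a minimal sketch. This is not the authors' implementation: the function names are hypothetical, the embeddings are made-up toy vectors standing in for encoder outputs, and fusing the text and sketch embeddings by element-wise addition is an assumption made only for illustration. What it shows is the property the abstract highlights: image embeddings are computed and indexed once, independently of any query, and each query only needs a similarity lookup.

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def fuse_query(text_emb, sketch_emb):
    """Late fusion of the two query embeddings (ASSUMPTION: element-wise sum)."""
    return normalize([t + s for t, s in zip(text_emb, sketch_emb)])

def build_index(image_embs):
    """Precompute normalized image embeddings once, independent of any query."""
    return [normalize(e) for e in image_embs]

def retrieve(index, query, k=1):
    """Return the indices of the k images most similar to the query."""
    scores = [sum(a * b for a, b in zip(img, query)) for img in index]
    return sorted(range(len(index)), key=lambda i: scores[i], reverse=True)[:k]

# Toy 3-d embeddings standing in for image-encoder outputs (made-up numbers).
index = build_index([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]])

# The text and the sketch each capture a different aspect of the target image.
query = fuse_query(text_emb=[1.0, 0.0, 0.0], sketch_emb=[0.0, 1.0, 0.0])
top = retrieve(index, query, k=2)

# Text-only baseline for comparison.
text_only = retrieve(index, normalize([1.0, 0.0, 0.0]), k=1)
```

In this toy setup the fused query ranks the third image first, since it matches both the text and the sketch direction, while the text-only query ranks the first image first. Because `build_index` never sees a query, the same index can serve any number of queries.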


Figure 1: We introduce datasets for 3D tracking and motion forecasting with rich maps for autonomous driving. Our 3D tracking dataset contains sequences of LiDAR measurements, 360° RGB video, front-facing stereo (middle-right), and 6-DOF localization. All sequences are aligned with maps containing lane center lines (magenta), driveable region (orange), and ground height. Sequences are annotated with 3D cuboid tracks (green). A wider map view is shown in the bottom-right.
